Near-optimal PAC bounds for discounted MDPs
نویسندگان
چکیده
منابع مشابه
Near-optimal PAC bounds for discounted MDPs
We study upper and lower bounds on the sample-complexity of learning near-optimal behaviour in finite-state discounted Markov Decision Processes (MDPs). We prove a new bound for a modified version of Upper Confidence Reinforcement Learning (UCRL) with only cubic dependence on the horizon. The bound is unimprovable in all parameters except the size of the state/action space, where it depends lin...
متن کاملPAC Bounds for Discounted MDPs
We study upper and lower bounds on the sample-complexity of learning nearoptimal behaviour in finite-state discounted Markov Decision Processes (mdps). We prove a new bound for a modified version of Upper Confidence Reinforcement Learning (ucrl) with only cubic dependence on the horizon. The bound is unimprovable in all parameters except the size of the state/action space, where it depends line...
متن کاملPAC-optimal, Non-parametric Algorithms and Bounds for Exploration in Concurrent MDPs with Delayed Updates
PAC-optimal, Non-parametric Algorithms and Bounds for Exploration in Concurrent MDPs with Delayed Updates by Jason Pazis Department of Computer Science Duke University Date: Approved: Ronald Parr, Supervisor Vincent Conitzer George Konidaris Mauro Maggioni Peter Stone An abstract of a dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the...
متن کاملTotal Expected Discounted Reward MDPs: Existence of Optimal Policies
This article describes the results on the existence of optimal and nearly optimal policies for Markov Decision Processes (MDPs) with total expected discounted rewards. The problem of optimization of total expected discounted rewards for MDPs is also known under the name of discounted dynamic programming.
متن کاملNear-Optimal Interdiction of Factored MDPs
Stackelberg games have been widely used to model interactions between attackers and defenders in a broad array of security domains. One related approach involves plan interdiction, whereby a defender chooses a subset of actions to block (remove), and the attacker constructs an optimal plan in response. In previous work, this approach has been introduced in the context of Markov decision process...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Theoretical Computer Science
سال: 2014
ISSN: 0304-3975
DOI: 10.1016/j.tcs.2014.09.029